This R notebook enables the reproduction of the analysis described in:
Tracing production instability in a clonally-derived CHO cell line using single cell transcriptomics
In this study, four biological replicates of a clonally derived CHO cell line were sampled at 72 hrs post-seeding and analysed using BD Rhapsody whole transcriptome analysis (WTA). The FASTQ data were converted to a cell-by-gene count matrix using the Seven Bridges Genomics BD Rhapsody WTA pipeline and combined into a single sample.
SRA ID: PRJNA661407
library(monocle)
library(plyr)
library(dplyr)
library(ggExtra)
library(reshape)
library(colorRamps)
library(stringr)
library(Seurat)
library(BiocParallel)
library(ggpubr)
library(DT)
library(WebGestaltR)
library(ggalt)
library(biomaRt)
library(patchwork)
library(viridis)
library(gridExtra)
library(grid)
library(gganimate)
library(hrbrthemes)
library(jcolors)
library(cowplot)
library(data.table)
library(GGally)
library(ggplotify)
library(writexl)
Create a function to determine the relative expression of a gene from a Monocle cell dataset. When the expression of a gene is less than the detection limit, it is set equal to the detection limit.
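A minimal sketch of such a helper is given here in base R. It assumes that "relative expression" means UMI counts divided by per-cell size factors, which may differ from the notebook's own function; the name `relative_expression`, its arguments and the toy matrix are hypothetical.

```r
# Hypothetical helper: size-factor-normalised expression of one gene,
# floored at the detection limit. `counts` has genes in rows, cells in columns.
relative_expression <- function(counts, gene, size_factors, detection_limit = 0.1) {
  stopifnot(gene %in% rownames(counts),
            length(size_factors) == ncol(counts))
  rel <- counts[gene, ] / size_factors
  # Values below the detection limit are set equal to the detection limit
  pmax(rel, detection_limit)
}

# Toy data (not from the study)
counts <- matrix(c(0, 4, 10,
                   2, 0, 6),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB"),
                                 c("cell1", "cell2", "cell3")))
size_factors <- c(cell1 = 0.5, cell2 = 1, cell3 = 2)

rel_a <- relative_expression(counts, "GeneA", size_factors)
```

With a real Monocle CDS, the counts and size factors would presumably come from the CDS object rather than from hand-built vectors.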
We import data from the Seven Bridges Genomics (SBG) platform for analysis in R. The UMI data for each replicate are stored in a separate CSV file, with each row representing an individual cell and each column representing a gene. In this experiment, 4 biological replicates of the CHOK1 DP12 cell line were analysed using Rhapsody WTA. We create a batch ID that can be used to regress out any technical variation due to batch. We determine the genes detected in at least one replicate; if a gene is not detected in a replicate, we assign it a UMI value of 0 for that replicate. The replicates are then combined into a single matrix.
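The combination step can be sketched in base R as follows. The two small matrices stand in for the per-replicate CSV files (which would normally be read from disk), and all object names, including `pad_replicate`, are hypothetical rather than taken from the notebook.

```r
# Toy replicate matrices: rows are cells, columns are genes (as in the CSV files)
rep1 <- matrix(c(1, 0,
                 3, 2), nrow = 2, byrow = TRUE,
               dimnames = list(c("c1", "c2"), c("GeneA", "GeneB")))
rep2 <- matrix(c(5, 1,
                 0, 4), nrow = 2, byrow = TRUE,
               dimnames = list(c("c1", "c2"), c("GeneB", "GeneC")))
replicates <- list(rep1 = rep1, rep2 = rep2)

# Genes detected in at least one replicate
all_genes <- sort(unique(unlist(lapply(replicates, colnames))))

# Pad a replicate with zero-count columns for genes it does not contain
pad_replicate <- function(m, genes) {
  padded <- matrix(0, nrow = nrow(m), ncol = length(genes),
                   dimnames = list(rownames(m), genes))
  padded[, colnames(m)] <- m
  padded
}
padded <- lapply(replicates, pad_replicate, genes = all_genes)

# Prefix cell IDs with the replicate name so they remain unique after stacking
for (id in names(padded)) {
  rownames(padded[[id]]) <- paste(id, rownames(padded[[id]]), sep = "_")
}

# Combine into a single matrix and record a batch ID for each cell
combined <- do.call(rbind, padded)
batch <- rep(names(padded), times = vapply(padded, nrow, integer(1)))
```

The `batch` vector can then be stored alongside the cells and used as a covariate wherever batch is regressed out.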
The primary analysis package for this analysis is Monocle v2. To use this package we must first create a cell dataset (CDS) object containing the raw Rhapsody data. During this process we assign a detection limit along with the expressionFamily, in this case negbinomial, as appropriate for UMI data.
[1] "Raw data for 4673 and 20594 genes captured"
To compare a matched bulkRNASeq dataset to our single cell data, we calculated a gene-level transcripts per million (TPM) value using Kallisto.
To compare to the bulkRNASeq data, we merge the scRNASeq data into pseudoBulk profiles, 1 for each replicate, and convert expression to TPM. We then determine the agreement between the replicates.
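A rough base R sketch of the pseudobulk step is shown here on simulated counts. It assumes that UMI counts are simply rescaled so that each profile sums to one million, with no transcript length correction, which may not match the notebook's exact TPM calculation. Pearson correlation on log-transformed values is likewise an illustrative choice of agreement measure, and all object names are hypothetical.

```r
set.seed(1)
# Simulated UMI matrix: genes in rows, cells in columns; two hypothetical replicates
umi <- matrix(rpois(40, lambda = 5), nrow = 5,
              dimnames = list(paste0("Gene", 1:5), paste0("cell", 1:8)))
replicate_id <- rep(c("rep1", "rep2"), each = 4)

# Sum UMIs over the cells of each replicate to form one pseudobulk profile each
pseudobulk <- t(rowsum(t(umi), group = replicate_id))

# Rescale each profile so that it sums to one million
per_million <- sweep(pseudobulk, 2, colSums(pseudobulk), "/") * 1e6

# Agreement between replicates, computed on a log scale
agreement <- cor(log2(per_million + 1), method = "pearson")
```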
Determine the agreement between the bulkRNASeq replicates.
# scRNASeq v bulkRNASeq
Comparison of each scRNASeq sample to its respective matched bulkRNASeq sample.
The first stage of preprocessing in the Monocle workflow aims to eliminate poor quality cells from further analysis. In this section we calculate the metrics used for this filtering: the total UMIs per cell and the fraction of UMIs that originate from mitochondrial genes.
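The two metrics can be computed directly from a gene-by-cell UMI matrix, as in the base R sketch below. The gene names, the mitochondrial gene list and the thresholds are all hypothetical and chosen only so that the toy example filters one cell; they are not the cutoffs used in the study.

```r
# Toy UMI matrix: genes in rows, cells in columns
umi <- matrix(c(10, 0, 5,
                 3, 2, 0,
                 1, 8, 5),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("GeneA", "GeneB", "MitoGene1"),
                              c("cell1", "cell2", "cell3")))
mito_genes <- "MitoGene1"  # hypothetical mitochondrial gene list

# Per-cell QC metrics
total_umis <- colSums(umi)
mito_fraction <- colSums(umi[mito_genes, , drop = FALSE]) / total_umis

# Illustrative thresholds only
min_umis <- 10
max_mito <- 0.5
keep <- total_umis >= min_umis & mito_fraction <= max_mito

umi_filtered <- umi[, keep, drop = FALSE]
```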
Figure
Figure
Figure
Figure
The SBG platform utilises the ENSEMBL GTF file for the PICR genome. To improve the annotation of the genes, we utilise the corresponding Entrez IDs for those genes labelled with “ENSCGRG” IDs to obtain a gene symbol.
For the next section we used the cell cycle scoring function of Seurat to determine which phase of the cell cycle each cell was in. Using this function allows us to correct for the effect of cell cycle in subsequent analyses.
Before scoring cells by cell cycle phase, any variation due to batch is removed.
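To illustrate what removing batch variation means in its simplest form, the base R sketch below regresses a simulated log-expression value on a batch factor and keeps the residuals. This is a stand-in for illustration only: the notebook presumably performs this step inside Seurat, and the simulated data and object names are hypothetical.

```r
set.seed(42)
batch <- factor(rep(c("rep1", "rep2"), each = 50))

# Simulated log-expression of one gene with a shift of +2 in rep2
expr <- rnorm(100, mean = 3) + ifelse(batch == "rep2", 2, 0)

# Fit expression as a function of batch and keep the residuals
fit <- lm(expr ~ batch)
corrected <- residuals(fit)

# After correction the mean of each batch is (numerically) zero
batch_means <- tapply(corrected, batch, mean)
```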
Here we use a custom list of CHO cell genes involved in the cell cycle.
Figure
Figure
Distance cutoff calculated to 3.224882
## Effect of CC correction
Figure
Using Monocle’s differential gene test, the genes that differ between the clusters of cells are identified. We again regress out confounding factors, i.e. batch and cell cycle phase.
462.535 sec elapsed
## scRNASeq
[1] "Heavy chain DE q-value for cluster = 1.6e-53"
## HC cluster expression
[1] "Light chain DE q-values for cluster = 1.3e-36"
Figure
Figure
Create and save figure 5 in the manuscript.
Cell state 1 is set as the root or beginning of the trajectory.
## Pseudotime differential expression
Identification of genes that correlated with Pseudotime. Those genes found to have a qval < 0.01 using the Monocle differentialGeneTest function were considered significant.
[1] TRUE
Use WebGestaltR to determine whether the identified clusters of genes are enriched for Gene Ontology (GO) biological processes. Those biological processes with a Benjamini-Hochberg adjusted p-value < 0.05 were considered significantly enriched.
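The significance threshold can be illustrated with base R's `p.adjust`. The p-values and term names below are invented for the example; WebGestaltR reports its own adjusted values, so this only shows how a Benjamini-Hochberg cutoff of 0.05 behaves.

```r
# Hypothetical raw enrichment p-values for four GO terms
p_values <- c(termA = 0.001, termB = 0.02, termC = 0.03, termD = 0.6)

# Benjamini-Hochberg adjustment
fdr <- p.adjust(p_values, method = "BH")

# Terms passing the adjusted p-value < 0.05 threshold
significant <- names(fdr)[fdr < 0.05]
```

Here `termC` passes even though its raw p-value times the number of tests would exceed 0.05, because the Benjamini-Hochberg procedure scales each p-value by its rank.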
Overlay the expression of genes found to significantly correlate with the progression of cells along the trajectory.
Plot the expression of the Prosaposin gene
### Hmox1
Plot the expression of the Heme oxygenase 1 gene
### Fth1
Plot the expression of the Ferritin heavy chain 1 gene
Create and save figure 6 in the manuscript.
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.0.2 (2020-06-22)
os Ubuntu 16.04.5 LTS
system x86_64, linux-gnu
ui X11
language (EN)
collate en_IE.UTF-8
ctype en_IE.UTF-8
tz Europe/Dublin
date 2020-10-05
─ Packages ───────────────────────────────────────────────────────────────────
! package * version date lib source
abind 1.4-5 2016-07-21 [1] CRAN (R 4.0.2)
AnnotationDbi 1.50.1 2020-06-29 [1] Bioconductor
apcluster 1.4.8 2019-08-21 [1] CRAN (R 4.0.2)
ape 5.4 2020-06-03 [1] CRAN (R 4.0.2)
ash 1.0-15 2015-09-01 [1] CRAN (R 4.0.2)
askpass 1.1 2019-01-13 [1] CRAN (R 4.0.2)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
backports 1.1.8 2020-06-17 [1] CRAN (R 4.0.2)
Biobase * 2.48.0 2020-04-27 [1] Bioconductor
BiocFileCache 1.12.0 2020-04-27 [1] Bioconductor
BiocGenerics * 0.34.0 2020-04-27 [1] Bioconductor
BiocManager 1.30.10 2019-11-16 [1] CRAN (R 4.0.2)
BiocParallel * 1.22.0 2020-04-27 [1] Bioconductor
biomaRt * 2.44.1 2020-06-17 [1] Bioconductor
bit 1.1-15.2 2020-02-10 [1] CRAN (R 4.0.2)
bit64 0.9-7 2017-05-08 [1] CRAN (R 4.0.2)
blob 1.2.1 2020-01-20 [1] CRAN (R 4.0.2)
bookdown 0.20 2020-06-23 [1] CRAN (R 4.0.2)
broom 0.7.0 2020-07-09 [1] CRAN (R 4.0.2)
callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.2)
car 3.0-8 2020-05-21 [1] CRAN (R 4.0.2)
carData 3.0-4 2020-05-22 [1] CRAN (R 4.0.2)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.0.2)
cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.2)
P cluster 2.1.0 2019-06-19 [?] CRAN (R 4.0.0)
P codetools 0.2-16 2018-12-24 [?] CRAN (R 4.0.0)
colorRamps * 2.3 2012-10-29 [1] CRAN (R 4.0.2)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 4.0.2)
combinat 0.0-8 2012-10-29 [1] CRAN (R 4.0.2)
cowplot * 1.0.0 2019-07-11 [1] CRAN (R 4.0.2)
cpp11 0.2.1 2020-08-11 [1] CRAN (R 4.0.2)
crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
crosstalk 1.1.0.1 2020-03-13 [1] CRAN (R 4.0.2)
curl 4.3 2019-12-02 [1] CRAN (R 4.0.2)
data.table * 1.12.8 2019-12-09 [1] CRAN (R 4.0.2)
DBI 1.1.0 2019-12-15 [1] CRAN (R 4.0.2)
dbplyr 1.4.4 2020-05-27 [1] CRAN (R 4.0.2)
DDRTree * 0.1.5 2017-04-30 [1] CRAN (R 4.0.2)
densityClust 0.3 2017-10-24 [1] CRAN (R 4.0.2)
desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
devtools 2.3.0 2020-04-10 [1] CRAN (R 4.0.2)
digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.2)
docopt 0.7.1 2020-06-24 [1] CRAN (R 4.0.2)
doParallel 1.0.15 2019-08-02 [1] CRAN (R 4.0.2)
doRNG 1.8.2 2020-01-27 [1] CRAN (R 4.0.2)
dplyr * 1.0.0 2020-05-29 [1] CRAN (R 4.0.2)
DT * 0.14 2020-06-24 [1] CRAN (R 4.0.2)
ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.2)
extrafont 0.17 2014-12-08 [1] CRAN (R 4.0.2)
extrafontdb 1.0 2012-06-11 [1] CRAN (R 4.0.2)
fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
farver 2.0.3 2020-01-16 [1] CRAN (R 4.0.2)
fastICA 1.2-2 2019-07-08 [1] CRAN (R 4.0.2)
fastmap 1.0.1 2019-10-08 [1] CRAN (R 4.0.2)
fitdistrplus 1.1-1 2020-05-19 [1] CRAN (R 4.0.2)
FNN 1.1.3 2019-02-15 [1] CRAN (R 4.0.2)
forcats 0.5.0 2020-03-01 [1] CRAN (R 4.0.2)
foreach 1.5.0 2020-03-30 [1] CRAN (R 4.0.2)
P foreign 0.8-80 2020-05-24 [?] CRAN (R 4.0.2)
formatR 1.7 2019-06-11 [1] CRAN (R 4.0.2)
fs 1.4.2 2020-06-30 [1] CRAN (R 4.0.2)
future 1.18.0 2020-07-09 [1] CRAN (R 4.0.2)
future.apply 1.6.0 2020-07-01 [1] CRAN (R 4.0.2)
gdtools 0.2.2 2020-04-03 [1] CRAN (R 4.0.2)
generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.2)
GGally * 2.0.0 2020-06-06 [1] CRAN (R 4.0.2)
ggalt * 0.4.0 2017-02-15 [1] CRAN (R 4.0.2)
gganimate * 1.0.6 2020-07-08 [1] CRAN (R 4.0.2)
ggExtra * 0.9 2019-08-27 [1] CRAN (R 4.0.2)
ggplot2 * 3.3.2 2020-06-19 [1] CRAN (R 4.0.2)
ggplotify * 0.0.5 2020-03-12 [1] CRAN (R 4.0.2)
ggpubr * 0.4.0 2020-06-27 [1] CRAN (R 4.0.2)
ggrepel 0.8.2 2020-03-08 [1] CRAN (R 4.0.2)
ggridges 0.5.2 2020-01-12 [1] CRAN (R 4.0.2)
ggsignif 0.6.0 2019-08-08 [1] CRAN (R 4.0.2)
globals 0.12.5 2019-12-07 [1] CRAN (R 4.0.2)
glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.2)
gridExtra * 2.3 2017-09-09 [1] CRAN (R 4.0.2)
gridGraphics 0.5-0 2020-02-25 [1] CRAN (R 4.0.2)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.2)
haven 2.3.1 2020-06-01 [1] CRAN (R 4.0.2)
highr 0.8 2019-03-20 [1] CRAN (R 4.0.2)
hms 0.5.3 2020-01-08 [1] CRAN (R 4.0.2)
hrbrthemes * 0.8.0 2020-03-06 [1] CRAN (R 4.0.2)
HSMMSingleCell 1.8.0 2020-05-07 [1] Bioconductor
htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2)
htmlwidgets 1.5.1 2019-10-08 [1] CRAN (R 4.0.2)
httpuv 1.5.4 2020-06-06 [1] CRAN (R 4.0.2)
httr 1.4.1 2019-08-05 [1] CRAN (R 4.0.2)
ica 1.0-2 2018-05-24 [1] CRAN (R 4.0.2)
igraph 1.2.5 2020-03-19 [1] CRAN (R 4.0.2)
IRanges 2.22.2 2020-05-21 [1] Bioconductor
irlba * 2.3.3 2019-02-05 [1] CRAN (R 4.0.2)
iterators 1.0.12 2019-07-26 [1] CRAN (R 4.0.2)
jcolors * 0.0.4 2019-05-22 [1] CRAN (R 4.0.2)
jsonlite 1.7.0 2020-06-25 [1] CRAN (R 4.0.2)
P KernSmooth 2.23-17 2020-04-26 [?] CRAN (R 4.0.0)
knitr * 1.29 2020-06-23 [1] CRAN (R 4.0.2)
labeling 0.3 2014-08-23 [1] CRAN (R 4.0.2)
later 1.1.0.1 2020-06-05 [1] CRAN (R 4.0.2)
P lattice 0.20-41 2020-04-02 [?] CRAN (R 4.0.0)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.0.2)
leiden 0.3.3 2020-02-04 [1] CRAN (R 4.0.2)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.2)
limma 3.44.3 2020-06-12 [1] Bioconductor
listenv 0.8.0 2019-12-05 [1] CRAN (R 4.0.2)
lmtest 0.9-37 2019-04-30 [1] CRAN (R 4.0.2)
magick 2.4.0 2020-06-23 [1] CRAN (R 4.0.2)
magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.2)
maps 3.3.0 2018-04-03 [1] CRAN (R 4.0.2)
P MASS 7.3-51.6 2020-04-26 [?] CRAN (R 4.0.0)
P Matrix * 1.2-18 2019-11-27 [?] CRAN (R 4.0.0)
matrixStats 0.56.0 2020-03-13 [1] CRAN (R 4.0.2)
memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.2)
mime 0.9 2020-02-04 [1] CRAN (R 4.0.2)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.0.2)
monocle * 2.16.0 2020-04-27 [1] Bioconductor
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.2)
P nlme 3.1-148 2020-05-24 [?] CRAN (R 4.0.2)
openssl 1.4.2 2020-06-27 [1] CRAN (R 4.0.2)
openxlsx 4.1.5 2020-05-06 [1] CRAN (R 4.0.2)
packrat 0.5.0 2018-11-14 [1] CRAN (R 4.0.2)
patchwork * 1.0.1 2020-06-22 [1] CRAN (R 4.0.2)
pbapply 1.4-2 2019-08-31 [1] CRAN (R 4.0.2)
pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.0.2)
pillar 1.4.6 2020-07-10 [1] CRAN (R 4.0.2)
pkgbuild 1.0.8 2020-05-07 [1] CRAN (R 4.0.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
plotly 4.9.2.1 2020-04-04 [1] CRAN (R 4.0.2)
plyr * 1.8.6 2020-03-03 [1] CRAN (R 4.0.2)
png 0.1-7 2013-12-03 [1] CRAN (R 4.0.2)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
processx 3.4.3 2020-07-05 [1] CRAN (R 4.0.2)
progress 1.2.2 2019-05-16 [1] CRAN (R 4.0.2)
proj4 1.0-10 2020-03-02 [1] CRAN (R 4.0.2)
promises 1.1.1 2020-06-09 [1] CRAN (R 4.0.2)
proxy 0.4-24 2020-04-25 [1] CRAN (R 4.0.2)
ps 1.3.3 2020-05-08 [1] CRAN (R 4.0.2)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.2)
qlcMatrix 0.9.7 2018-04-20 [1] CRAN (R 4.0.2)
R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.2)
RANN 2.6.1 2019-01-08 [1] CRAN (R 4.0.2)
rappdirs 0.3.1 2016-03-28 [1] CRAN (R 4.0.2)
RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.0.2)
Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.2)
RcppAnnoy 0.0.16 2020-03-08 [1] CRAN (R 4.0.2)
readr 1.3.1 2018-12-21 [1] CRAN (R 4.0.2)
readxl 1.3.1 2019-03-13 [1] CRAN (R 4.0.2)
remotes 2.1.1 2020-02-15 [1] CRAN (R 4.0.2)
reshape * 0.8.8 2018-10-23 [1] CRAN (R 4.0.2)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.0.2)
reticulate 1.16 2020-05-27 [1] CRAN (R 4.0.2)
rio 0.5.16 2018-11-26 [1] CRAN (R 4.0.2)
rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2)
rmarkdown 2.3 2020-06-18 [1] CRAN (R 4.0.2)
rmdformats * 0.3.7 2020-03-11 [1] CRAN (R 4.0.2)
rngtools 1.5 2020-01-23 [1] CRAN (R 4.0.2)
ROCR 1.0-11 2020-05-02 [1] CRAN (R 4.0.2)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.2)
RSQLite 2.2.0 2020-01-07 [1] CRAN (R 4.0.2)
rstatix 0.6.0 2020-06-18 [1] CRAN (R 4.0.2)
rstudioapi 0.11 2020-02-07 [1] CRAN (R 4.0.2)
rsvd 1.0.3 2020-02-17 [1] CRAN (R 4.0.2)
Rtsne 0.15 2018-11-10 [1] CRAN (R 4.0.2)
Rttf2pt1 1.3.8 2020-01-10 [1] CRAN (R 4.0.2)
rvcheck 0.1.8 2020-03-01 [1] CRAN (R 4.0.2)
S4Vectors 0.26.1 2020-05-16 [1] Bioconductor
scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.2)
sctransform 0.2.1 2019-12-17 [1] CRAN (R 4.0.2)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
Seurat * 3.1.5 2020-04-16 [1] CRAN (R 4.0.2)
shiny 1.5.0 2020-06-23 [1] CRAN (R 4.0.2)
slam 0.1-47 2019-12-21 [1] CRAN (R 4.0.2)
sparsesvd 0.2 2019-07-15 [1] CRAN (R 4.0.2)
stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.2)
stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
P survival 3.2-3 2020-06-13 [?] CRAN (R 4.0.1)
svglite 1.2.3.2 2020-07-07 [1] CRAN (R 4.0.2)
systemfonts 0.3.0 2020-09-01 [1] CRAN (R 4.0.2)
testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.2)
tibble 3.0.3 2020-07-10 [1] CRAN (R 4.0.2)
tictoc 1.0 2014-06-17 [1] CRAN (R 4.0.2)
tidyr 1.1.0 2020-05-20 [1] CRAN (R 4.0.2)
tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.2)
tsne 0.1-3 2016-07-15 [1] CRAN (R 4.0.2)
tweenr 1.0.1 2018-12-14 [1] CRAN (R 4.0.2)
usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.2)
uwot 0.1.8 2020-03-16 [1] CRAN (R 4.0.2)
vctrs 0.3.1 2020-06-05 [1] CRAN (R 4.0.2)
VGAM * 1.1-3 2020-04-28 [1] CRAN (R 4.0.2)
viridis * 0.5.1 2018-03-29 [1] CRAN (R 4.0.2)
viridisLite * 0.3.0 2018-02-01 [1] CRAN (R 4.0.2)
WebGestaltR * 0.4.3 2020-01-16 [1] CRAN (R 4.0.2)
whisker 0.4 2019-08-28 [1] CRAN (R 4.0.2)
withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.2)
writexl * 1.3 2020-05-05 [1] CRAN (R 4.0.2)
xfun 0.15 2020-06-21 [1] CRAN (R 4.0.2)
XML 3.99-0.4 2020-07-05 [1] CRAN (R 4.0.2)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.0.2)
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
zip 2.0.4 2019-09-01 [1] CRAN (R 4.0.2)
zoo 1.8-8 2020-05-02 [1] CRAN (R 4.0.2)
[1] /mnt/HDD2/colin/CHO_cell_scRNASeq/packrat/lib/x86_64-pc-linux-gnu/4.0.2
[2] /mnt/HDD2/colin/CHO_cell_scRNASeq/packrat/lib-ext/x86_64-pc-linux-gnu/4.0.2
[3] /mnt/HDD2/colin/CHO_cell_scRNASeq/packrat/lib-R/x86_64-pc-linux-gnu/4.0.2
P ── Loaded and on-disk path mismatch.